
76 research outputs found

Komunikazioaren teoriaren oinarriak

Author: Hernáez Rioja Inmaculada
Publication venue: Servicio Editorial de la Universidad del País Vasco/Euskal Herriko Unibertsitatearen Argitalpen Zerbitzua
Publication date: 01/01/2019

This book is the guide for the lectures of the Communication Theory course taught in the Bachelor's Degree in Telecommunication Technology Engineering at the Bilbao School of Engineering (University of the Basque Country, UPV/EHU). Its contents therefore match those designed by the lecturers who have given the lectures since the new curriculum was introduced. The Communication Theory course covers the basic concepts of telecommunications. Starting from a formal and mathematical point of view, it describes the basic mechanisms by which information is transmitted in modern telecommunication systems (digital radio and television, data transmission, telephone communications, and so on).

Archivo Digital para la Docencia y la Investigación

Fundamentos de teoría de la comunicación

Author: Hernáez Rioja Inmaculada
Publication venue: Servicio Editorial de la Universidad del País Vasco/Euskal Herriko Unibertsitatearen Argitalpen Zerbitzua
Publication date: 01/01/2019

135 p. This book is a guide for the lectures of the Communication Theory course taught in the Bachelor's Degree in Telecommunication Technology Engineering at the Bilbao School of Engineering (Universidad del País Vasco/Euskal Herriko Unibertsitatea). The content is therefore the one designed by the teaching staff responsible for the lectures since the current curriculum was introduced. The Communication Theory course describes, from a formal and mathematical point of view, the basic mechanisms that make it possible to transmit information in modern telecommunication systems (digital radio and television, data transmission, telephone communications, etc.).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital para la Docencia y la Investigación

Komunikazioaren teoriaren oinarriak

Author: Hernáez Rioja Inmaculada Concepción
Publication venue: Servicio Editorial de la Universidad del País Vasco/Euskal Herriko Unibertsitatearen Argitalpen Zerbitzua
Publication date: 01/01/2019

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital para la Docencia y la Investigación

Evaluation of Tacotron Based Synthesizers for Spanish and Basque

Author: García Romillo Víctor
Hernáez Rioja Inmaculada
Navas Cordón Eva
Publication venue: 'MDPI AG'
Publication date: 01/02/2022

In this paper, we describe the implementation and evaluation of Text to Speech synthesizers based on neural networks for Spanish and Basque. Several voices were built, all of them using a limited amount of data. The system applies Tacotron 2 to compute mel-spectrograms from the input sequence, followed by WaveGlow as neural vocoder to obtain the audio signals from the spectrograms. The limited amount of data used for training the models leads to synthesis errors in some sentences. To automatically detect those errors, we developed a new method that is able to find the sentences that have lost the alignment during the inference process. To mitigate the problem, we implemented guided attention that provides the system with the explicit duration of the phonemes. The resulting system was evaluated to assess its robustness, quality and naturalness with both objective and subjective measures. The results reveal the capacity of the system to produce natural, good-quality audio. This work was funded by the Basque Government (Project refs. PIBA 2018-035, IT-1355-19). This work is part of the project Grant PID 2019-108040RB-C21 funded by MCIN/AEI/10.13039/501100011033

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Archivo Digital para la Docencia y la Investigación

Modelo de duración para conversión de texto a voz en euskera

Author: Hernáez Rioja Inmaculada
Navas Cordón Eva
Sánchez de la Fuente Jon
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2002

This paper presents the modelling of phone durations in standard Basque, to be included in a text-to-speech system. The statistical modelling has been done using binary regression trees and a large corpus containing 57,300 phones. Several experiments have been performed, testing different sets of predicting factors. Duration prediction with this model has an RMSE of 22.23 ms. This work was partially funded by the Ministerio de Ciencia y Tecnología (TIC2000-1005-C03-03 and TIC2000-1669-C04-03)

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Secretaría de Estado de Cultura

Enrichment of Oesophageal Speech: Voice Conversion with Duration-Matched Synthetic Speech as Target

Author: Hernáez Rioja Inmaculada
Navas Cordón Eva
Raman Sneha
Sarasola Aramendia Xabier
Publication venue: 'MDPI AG'
Publication date: 08/07/2021

Pathological speech such as Oesophageal Speech (OS) is difficult to understand due to the presence of undesired artefacts and lack of normal healthy speech characteristics. Modern speech technologies and machine learning enable us to transform pathological speech to improve intelligibility and quality. We have used a neural network based voice conversion method with the aim of improving the intelligibility and reducing the listening effort (LE) of four OS speakers of varying speaking proficiency. The novelty of this method is the use of synthetic speech matched in duration with the source OS as the target, instead of parallel aligned healthy speech. We evaluated the converted samples from this system using a collection of Automatic Speech Recognition (ASR) systems, an objective intelligibility metric (STOI) and a subjective test. ASR evaluation shows that the proposed system had significantly better word recognition accuracy than unprocessed OS and than baseline systems which used aligned healthy speech as the target. There was an improvement of at least 15% on STOI scores, indicating a higher intelligibility for the proposed system compared to unprocessed OS, and a higher target similarity in the proposed system compared to baseline systems. The subjective test reveals a significant preference for the proposed system compared to unprocessed OS for all OS speakers, except one who was the least proficient OS speaker in the data set. This project was supported by funding from the European Union’s H2020 research and innovation programme under the MSCA GA 675324 (the ENRICH network: www.enrich-etn.eu (accessed on 25 June 2021)), and the Basque Government (PIBA_2018_1_0035 and IT355-19)

Archivo Digital para la Docencia y la Investigación

Intelligibility and Listening Effort of Spanish Oesophageal Speech

Author: Hernáez Rioja Inmaculada
Navas Cordón Eva
Raman Sneha
Serrano García Luis
Winneke Axel
Publication venue: 'MDPI AG'
Publication date: 01/01/2019

Communication is a huge challenge for oesophageal speakers, be it for interactions with fellow humans or with digital voice assistants. We aim to quantify these communication challenges (both human-human and human-machine interactions) by measuring intelligibility and Listening Effort (LE) of Oesophageal Speech (OS) in comparison to Healthy Laryngeal Speech (HS). We conducted two listening tests (one web-based, the other in laboratory settings) to collect these measurements. Participants performed a sentence recognition and LE rating task in each test. Intelligibility, calculated as Word Error Rate, showed significant correlation with self-reported LE ratings. Speaker type (healthy or oesophageal) had a major effect on intelligibility and effort. More LE was reported for OS compared to HS even when OS intelligibility was close to HS. Listeners familiar with OS reported less effort when listening to OS compared to nonfamiliar listeners. However, such advantage of familiarity was not observed for intelligibility. Automatic speech recognition scores were lower for OS compared to HS. This project was supported by funding from the EU's H2020 research and innovation programme under the MSCA GA 675324 (the ENRICH network: www.enrich-etn.eu), the Spanish Ministry of Economy and Competitiveness with FEDER support (RESTORE project, TEC2015-67163-C2-1-R) and the Basque Government (DL4NLP KK-2019/00045, PIBA_2018_1_0035 and IT355-19)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital para la Docencia y la Investigación

Intelligibility and Listening Effort of Spanish Oesophageal Speech

Author: Hernáez Rioja Inmaculada
Navas Cordón Eva
Raman Sneha
Serrano García Luis
Winneke Axel
Publication venue: 'MDPI AG'
Publication date: 01/08/2019

Archivo Digital para la Docencia y la Investigación

Frame-Based Phone Classification Using EMG Signals

Author: De Zuazo Oteiza Xabier
Del Blanco Sierra Eder
Hernáez Rioja Inmaculada
Navas Cordón Eva
Salomons Inge
Publication venue: MDPI
Publication date: 13/07/2023

This paper evaluates the impact of inter-speaker and inter-session variability on the development of a silent speech interface (SSI) based on electromyographic (EMG) signals from the facial muscles. The final goal of the SSI is to provide a communication tool for Spanish-speaking laryngectomees by generating audible speech from voiceless articulation. However, before moving on to such a complex task, this study performs a simpler phone classification task under different speaker-dependency and session-dependency conditions. These experiments consist of processing the recorded utterances into phone-labeled segments and predicting the phonetic labels using only features obtained from the EMG signals. We evaluate and compare the performance of each model in terms of classification accuracy. Results show that the models are able to predict the phonetic label best when they are trained and tested using data from the same session. The accuracy drops drastically when the model is tested with data from a different session, although it improves when more data are added to the training data. Similarly, when the same model is tested on a session from a different speaker, the accuracy decreases. This suggests that using larger amounts of data could help to reduce the impact of inter-session variability, but more research is required to understand if this approach would suffice to account for inter-speaker variability as well. This research was funded by Agencia Estatal de Investigación grant number ref. PID2019-108040RB-C21/AEI/10.13039/50110001103

Archivo Digital para la Docencia y la Investigación

Automatic Classification of Synthetic Voices for Voice Banking Using Objective Measures

Author: Alonso Agustin
García Romillo Víctor
Hernáez Rioja Inmaculada
Navas Cordón Eva
Sánchez de la Fuente Jon
Publication venue: 'MDPI AG'
Publication date: 01/02/2022

Speech is the most common way of communication among humans. People who cannot communicate through speech due to partial or total loss of the voice can benefit from Alternative and Augmentative Communication devices and Text to Speech technology. One problem of using these technologies is that the included synthetic voices might be impersonal and badly adapted to the user in terms of age, accent or even gender. In this context, the use of synthetic voices from voice banking systems is an attractive alternative. New voices can be obtained by applying adaptation techniques to recordings from people with a healthy voice (donors) or from the users themselves before they lose their own voice. In this way, the goal is to offer a wide voice catalog to potential users. However, as there is no control over the recording or the adaptation processes, some method to control the final quality of the voice is needed. We present the work developed to automatically select the best synthetic voices using a set of objective measures and a subjective Mean Opinion Score (MOS) evaluation. A MOS prediction algorithm has been built, which correlates with the subjective scores similarly to the most correlated individual measure. This work has been funded by the Basque Government under the project ref. PIBA 2018-035 and IT-1355-19. This work is part of the project Grant PID 2019-108040RB-C21 funded by MCIN/AEI/10.13039/501100011033

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Archivo Digital para la Docencia y la Investigación